*Generates a monthly series with info from all medical claims for sample
*Version 15 Stata

set more off

*******************************************************************************
*NOTE: Overview of do-file

*Step 1: clean up Facility claims- collapse to member-month level,
*	and then merge Pharmacy data at member-month level with Facility data
*	at memb-month level. IMPORTANT: Need to run ad_transition_adults to merge
*	the appropriate coverage backbone, violations, etc.
*Step 2: supplementary info from Facility (define hospitalization and secondary diagnosis)
*Step 3: merge the monthly pharmacy costs of ALL pharmacy claims (not only antipressants as in "ad_transition_adults.do") 
*		with the main series from Step 2 
*Step 4: clean up Professional claims (from PROFESSIONAL_SERVICE table in SQL)- collapse to member-month level 
*		and then merge with the main series from Step 3
*Step 5: get zip code and county info for each member and save a dataset

* --> Final datasets: "adults_sample_series_merged_All.dta" and "uniq_member_adults_sample_zip.dta"

*******************************************************************************


*Step 1: Facility claims 
********************************************************************************
*NOTE: first round of Facility data focus on allowed cost amount and ER visits (defined from admission type)
*Before merging Pharmacy and Facility, run "ad_transition_adults.do" (except for the program generating transition matrices) 
*		to clean raw pharmacy claims of Antidepressants, collapse them to member-month level,
*		and then merge with NPI data (prescribers' info) and guideline violations on transitions. 

*Bring in raw data
import delimited using "FACILITY_adults_sample_ad.csv", clear varnames(1)

*Fix service date
gen serv_yr=substr(first_serv_dt,1,4)
gen serv_month=substr(first_serv_dt,6,2)
gen serv_day=substr(first_serv_dt,9,2) 
destring serv_yr, replace
destring serv_month, replace
destring serv_day, replace

gen month_code=12*(serv_yr==2014)+24*(serv_yr==2015)+ ///
                     36*(serv_yr==2016)+serv_month

gen day_code=month_code*100+serv_day
sort memb_bid_str day_code
order memb_bid_str month_code day_code serv_*,first

*save file
save FACILITY_adults_sample_ad.dta, replace

*Collapse to member-month level: total costs, no. Facility claims, no. ER visits, no. ER for MH reasons 
use FACILITY_adults_sample_ad.dta, clear

* no. facility header; for mental health reasons
bysort memb_bid_str month_code: gen n_facility=_N
gen i_mh_any=(i_mh_primary==1 | i_mh_admit==1) // about 8% 
bysort memb_bid_str month_code: egen n_facility_mh=sum(i_mh_any)

* cost
bysort memb_bid_str month_code: egen facility_cost=sum(allwd_amt)
gen temp=allwd_amt*i_mh_any
bysort memb_bid_str month_code: egen facility_cost_mh=sum(temp)
drop temp

* admit types (ER, urgent, elective)
gen i_admit_ER=(admit_tp=="1 ")
gen i_admit_urgent=(admit_tp=="2 ")
gen i_admit_elective=(admit_tp=="3 ")

local list ER urgent elective
foreach x of local list{
	bysort memb_bid_str month_code: egen n_`x'=sum(i_admit_`x')
	gen temp=i_admit_`x'*i_mh_any
	bysort memb_bid_str month_code: egen n_`x'_mh=sum(temp)
	drop temp 
}

bysort memb_bid_str month_code: gen unique=_n
keep if unique==1 // 2,324,798 obs at member-month level
keep memb_bid_str month_code n_* facility_cost* 

save monthly_FACILITY_adults_sample_ad.dta, replace

****************************************************************************
*IMPORTANT!!! HERE, YOU WILL USE THE DATA CREATED IN STEP 2 OF AD_TRANSITION_ADULTS.DO AND MERGE ON THE FACILITY FILE JUST CREATED!
****************************************************************************
*bring in coverage backbone with antidepressant information
use adults_sample_series_merged.dta, clear
merge 1:1 memb_bid_str month_code using monthly_FACILITY_adults_sample_ad.dta
rename _merge merge_ad_facil 

foreach var of varlist n_facility- n_elective_mh{
	sum `var' if merge_ad_facil==1
	replace `var'=0 if merge_ad_facil==1
}

drop order* memb_bid phrmcy_clm_bid prev_NPI1 prev_NPI2 prev_NPI3

sort memb_bid_str month_code
order memb_bid_str month_code,first
save "adults_sample_series_merged_All.dta"

*Step 2: Supplementary info from Facility - Hospitalization, and secondary ICD9
********************************************************************************
*(1) hospitalization defined through category of service (inpatient);
*(2) secondary ICD9 codes to identify more ER/hosp for mental health reasons.

*NOTE: Merge the supplementary data with "adults_sample_series_merged_All.dta" in the 
*		beginning of "reg-patient-physician.do"

*Supp (1): define hospitalization (count number of hospitalizations in a given month from Facility claims)
use FACILITY_adults_sample_ad.dta, clear
gen i_hosp=substr(ctgy_of_serv,1,2)=="IP" 
gen i_mh_any=(i_mh_primary==1 | i_mh_admit==1)
gen i_hosp_mh=(i_hosp==1 & i_mh_any==1) 

bysort memb_bid_str month_code: egen n_hosp=sum(i_hosp)
bysort memb_bid_str month_code: egen n_hosp_mh=sum(i_hosp_mh) 

bysort memb_bid_str month_code: gen uniq=_n
keep if uniq==1
keep memb_bid_str month_code n_hosp n_hosp_mh
save monthly_FACILITY_adults_sample_ad_supp.dta, replace

*Supp (2): secondary diagnosis (we found a lot more ER/hospitalization for mental health reasons from secondary diagnosis codes!)
* Import second diagnosis 
import delimited using FACILITY_second_ICD_adults_sample_ad.csv, clear
order facil_hdr_bid_str
drop facil_hdr_bid memb_bid 

gen i_mh_second=0 // indicator for whether secondary diagnosis is related to Mental Health (first 3 digits of ICD9 bet. 290 and 319)

*Check whether of the 10 second diagnosis, 
foreach var of varlist sec*{
	replace `var'=subinstr(`var'," ","",.) 
	replace i_mh_second=1 if inlist(substr(`var',1,3),"290","291","292","293","294","295","296","297","298")| ///
					inlist(substr(`var',1,3),"300","301","302","303","304","305","306","307","308") | ///
					inlist(substr(`var',1,3),"310","311","312","313","314","315","316","317","318") | ///
					inlist(substr(`var',1,3),"299","309","319")
}

tab i_mh_second
drop sec* 

*Merge with facility claims! (note the second ICD9 all come from another table, and the linked variable is the unique ID for facility claims.
merge 1:1 facil_hdr_bid_str using "FACILITY_adults_sample_ad.dta"
drop _merge

* admit types (ER, urgent, elective)
gen i_admit_ER=(admit_tp=="1 ")
gen i_admit_urgent=(admit_tp=="2 ")
gen i_admit_elective=(admit_tp=="3 ")

* category of service (IP - hospitalization)
gen i_hosp=substr(ctgy_of_serv,1,2)=="IP" 

* MH in secondary ICD but not primary 
gen i_mh_only_second=(i_mh_second==1 & i_mh_primary==0 & i_mh_admit==0) 

local list admit_ER admit_urgent admit_elective hosp
foreach x of local list{
* no. ER/hosp with MH in secondary ICD but not primary 
	gen temp2=i_`x'*i_mh_only_second
	bysort memb_bid_str month_code: egen n_`x'_mh2=sum(temp2)
	drop temp2
}

bysort memb_bid_str month_code: gen uniq=_n
keep if uniq==1 //2,324,798 obs
keep memb_bid_str month_code n_admit_ER_mh2 n_admit_urgent_mh2 n_admit_elective_mh2 n_hosp_mh2
save monthly_FACILITY_adults_sample_ad_supp2.dta, replace

*Step 3: Bring in cost of ALL Pharmacy claims, not just antidepressants
********************************************************************************
import delimited using "PHARM_adults_sample_ad_all_cost.csv", varnames(1) clear
save "PHARM_adults_sample_ad_all_cost.dta"

*pull of antidepressants
*use "PHARM_adults_sample_ad_all_cost.dta", clear
*keep if i_ad==1
*save PHARM_adults_sample_ad.dta, replace

*Fix service date
gen serv_yr=substr(serv_dt,1,4)
gen serv_month=substr(serv_dt,6,2)
gen serv_day=substr(serv_dt,9,2) 
destring serv_yr, replace
destring serv_month, replace
destring serv_day, replace

gen month_code=12*(serv_yr==2014)+24*(serv_yr==2015)+ 36*(serv_yr==2016)+serv_month
					 
*Collapse to member-month level
sort memb_bid_str month_code
bysort memb_bid_str month_code: gen unique=_n
bysort memb_bid_str month_code: gen n_pharm=_N
bysort memb_bid_str month_code: egen pharm_cost=sum(allwd_amt) 

keep if unique==1 
count //  11,371,917 
keep memb_bid_str month_code n_pharm pharm_cost

** Merge with series
merge 1:1 memb_bid_str month_code using adults_sample_series_merged_All.dta
rename _merge merge_any_pharm_cost
count // 14,433,962

foreach var of varlist n_facility- n_elective_mh{
	sum `var' if merge_any_pharm_cost==1
	replace `var'=0 if merge_any_pharm_cost==1
}
foreach var of varlist n_pharm pharm_cost{
	sum `var' if merge_any_pharm_cost==2
	replace `var'=0 if merge_any_pharm_cost==2
}

save "adults_sample_series_merged_All.dta",replace

*Step 4: Professional claims to member-month level
********************************************************************************
*import delimited using "PROF_adults_sample_ad_ANY.csv", clear
*save "PROF_adults_sample_ad_ANY.dta", replace

use PROF_adults_sample_ad_ANY.dta, clear

*Collapse to monthly level 
sort memb_bid_str month_code
bysort memb_bid_str month_code: gen unique=_n
bysort memb_bid_str month_code: gen n_prof=_N
bysort memb_bid_str month_code: egen n_prof_mh=sum(i_mh1) 
bysort memb_bid_str month_code: egen prof_cost=sum(allwd_amt)
gen temp=allwd_amt*i_mh1
bysort memb_bid_str month_code: egen prof_cost_mh=sum(temp) 

keep if unique==1
keep memb_bid_str month_code n_prof n_prof_mh prof_cost prof_cost_mh

save monthly_PROF_adults_sample_ad_ANY.dta, replace

** Merge with series
merge 1:1 memb_bid_str month_code using adults_sample_series_merged_All.dta
rename _merge merge_prof_cost
count

foreach var of varlist n_facility- n_elective_mh n_pharm pharm_cost{
	sum `var' if merge_prof_cost==1
	replace `var'=0 if merge_prof_cost==1
}

foreach var of varlist n_prof n_prof_mh prof_cost prof_cost_mh{
	sum `var' if merge_prof_cost==2
	replace `var'=0 if merge_prof_cost==2
}

*** some quick stats
count 
sum pharm_cost prof_cost* facility_cost* 
gen total_cost=pharm_cost+prof_cost+facility_cost 

* some members are no longer considered in the _ad sample
bysort memb_bid_str: gen uniq0=_n
count if uniq0==1 
gen temp=(molecule_final!=.) 
bysort memb_bid_str: egen check=sum(temp)
count if uniq0==1 & check>0 

gen in_ad_sample=(check>0) 
drop temp check uniq0 

sort memb_bid_str month_code
order memb_bid_str month_code total_cost,first

*** additional labels of variables
label var month_code "range from 1 to 45 (1- Jan 2013; 45 - Sept 2016)"
label var total_cost "total allowed amount in current month"
label var n_prof "no. professional claims"
label var n_prof_mh "no. professional claims with Mental Health diagnosis (ICD9)"
label var prof_cost "nominal professional cost in current month"
label var prof_cost_mh "nominal professional cost rel. to mental health in current month"
label var min_month_code "first month of enrollment at BCBS; 0 if before Jan2013 (Memb_Eligibility)"
label var serv_month "calendar month of service 1-12"
label var pharm_cost "nominal pharmacy claims cost in current month"
label var n_pharm "no. pharmacy claims in current month"
label var prscrb_prov_id_cd "BCBS's provider ID for prescribers"
label var npi_prscrb_prov_id_cd "NPI of prescribers"
label var days_supply_cnt "days of supply of a prescription (Pharmacy)"

label var prod_serv_id_cd_str "drug NDC code in string" 
label var productNDC_str "9-digit NDC code for matching BCBS prescriptions with NDC database"
label var productid "long drug ID from NDC"
label var proprietaryname "from NDC"
label var nonproprietaryname "from NDC"
label var substancename "molecule (from NDC database)"
label var productndc_str0 "raw format of 8 or 9 digit NDC"
label var locate "for manipulation of NDC code"
label var locate2 "for manipulation of NDC code"

label var molecule_final "final version of coding of molecule (including no-drug or other or cocktail)"
label var prev_molecule_final "previous molecule_final"
label var i_ad2 "1 if prescribed an antidepressant in current month; non-missing over the treatment period"
label var first_month "First month of observed script of antidepress"
label var max_ad_month "Last month of observed script of antidepress"
label var npi_str0 "NPI of matched prescribers (string))"
label var next_NPI "NPI of next prescriber (string)"
label var prev_npi_str "NPI of previous prescriber (string)"
label var change_prescriber "1 if changing prescriber (by NPI)"
label var npi0 "NPI of matched prescribers (numeric)"

label var memb_female "1 if member is female"
label var brth_yr "birth yr of member"
label var min_covrg "min month of coverage by BCBS (Member-Eligibility)"
label var max_covrg "max month of coverage by BCBS (Member-Eligibility)"
label var zip_cd "member's zip code"
label var st_cd "member's state code"
label var memb_county "member's county code"
label var serv_yr "year of service (from claims)"

order memb_bid_str memb_female brth_yr min_covrg max_covrg st_cd memb_county zip_cd month_code serv_yr serv_month,first

label var npi_str "NPI of prescriber(string), used in matching"
label var i_npi_str "unique numeric ID for each prescriber (NPI) within this dataset"
label var raw_str "raw description of primary taxonomy/specialty, from NPI"
label var level0 "level0 descript. of primary taxonomy"
label var level1 "level1 descript. of primary taxonomy"
label var level2 "level2 descript. of primary taxonomy"
label var i_physician "1 if physician, by primary taxonomy"
label var i_lag_phy "lagged 1 if physician"
label var i_psych "1 if Psychiatrist, by primary taxonomy"
label var i_GP "1 if GP, by primary taxonomy"
label var sum_i_psych "total no. office visits to Psychiatrists"

label var n_facility "no. facility claims" 
label var n_facility_mh "no. facility claims related to mental health (ICD9)" 
label var facility_cost "nominal cost of faclity claims"
label var facility_cost_mh "nominal cost of faclity claims-Mental Health"
label var n_ER "no. ER visit"
label var n_ER_mh "no. ER - Mental Health"
label var n_urgent "no. urgent care"
label var n_urgent_mh "no. urgent care - Mental Health"
label var n_elective "no. elective visits"
label var n_elective_mh "no. elective visits - Mental Health"
label var n_hosp "no. inpatient/hospitalization"
label var n_hosp_mh "no. inpatient/hospitalization - Mental Health"
label var i_ER "1 if Any ER"
label var i_ER_mh "1 if ER - Mental Health"
label var i_hosp "1 if Any hospitalization"
label var i_hosp_mh "1 if hospitalization - Mental Health"
label var i_ER_hosp "1 if Any ER/hospitalization"
label var i_ER_hosp_mh "1 if ER/hospitalization - Mental Health"

label var real_total_cost "real cost in Jan-2013 dollars"
label var ln_total_cost "ln(total cost in Jan-2013 dollars)"
label var non_pharm_cost "non-Pharmacy cost (current month) in Jan-2013 dollars"
label var ln_non_pharm "ln(non-Pharmacy cost (current month) in Jan-2013 dollars)"
label var cost1 "real cost in month (t+1)"
label var cost2 "real cost in month (t+2)"
label var three_month_cost "3-month cost in Jan-2013 dollars"
label var ln_three_month_cost "ln(3-month cost in Jan-2013 dollars)"
label var three_month_non_pharm "3-month non-Pharmacy cost in Jan-2013 dollars"
label var ln_three_month_non_pharm "ln(3-month non-Pharmacy cost in Jan-2013 dollars)"

*save final data--this is used for all regressions!
save "adults_sample_series_merged_All.dta",replace

*Step 5:  zip code of member --> county 
********************************************************************************
*Bring in raw data
import delimited using "member_adults_sample_zip.csv", clear varnames(1)
rename memb_zip zip_cd
merge m:1 zip_cd using "reference_zip.dta" 
keep if _merge==3
drop _merge
keep memb_bid_str mbrs_crnt_prim_zip_cd zip_cd st_cd st_nm cnty_fips_cd st_fips_cd

save "member_adults_sample_zip.dta"

** unique to member level 
bysort memb_bid_str: gen uniq=_n
bysort memb_bid_str st_fips_cd cnty_fips_cd : gen check=_n
count if uniq==1
count if check==1

replace st_fips_cd=51 if st_nm=="Virginia"
tab st_nm if st_fips_cd==. 
replace st_fips_cd=1 if st_nm=="Alabama"
replace st_fips_cd=6 if st_nm=="California"
replace st_fips_cd=12 if st_nm=="Florida"
replace st_fips_cd=13 if st_nm=="Georgia"
replace st_fips_cd=17 if st_nm=="Illinois"
replace st_fips_cd=21 if st_nm=="Kentucky"
replace st_fips_cd=27 if st_nm=="Minnesota"
replace st_fips_cd=39 if st_nm=="Ohio"
replace st_fips_cd=40 if st_nm=="Oklahoma"
replace st_fips_cd=42 if st_nm=="Pennsylvania"
replace st_fips_cd=48 if st_nm=="Texas"

destring cnty_fips_cd,replace
count if st_fips==.
count if st_fips==. & cnty_fips==.

gen memb_county=st_fips*1000+cnty_fips
replace memb_county=0 if memb_county==.

keep if uniq==1 
keep memb_bid_str zip_cd st_cd memb_county
save "uniq_member_adults_sample_zip.dta", replace


